-
Notifications
You must be signed in to change notification settings - Fork 3
Add tests for std::[unordered_][multi]set
#39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! I left some comments, mostly we should start by figuring out what we want to test for the binary format in particular. Originally I considered being lazy for other containers and put all std::[unordered_][multi]set
in a single test, without the full combination of index column types. But I think the multi
containers make this awkward because there we actually want to test that the duplicates are preserved, and also we may want to be 100% certain that we get the index columns right. For the non-multi
containers though, duplicates are essentially handled on the C++ side, before it comes to RNTuple I think, so not sure if we need those entries here...
Yeah that's fair enough! I suppose we could squash [unordered_]set and [unordered_]multiset. One thing to consider is that it's not a given that every writer/reader will be written in C++ (we already know this isn't the case), but the spec explicitly refers to C++ types. But you're right that still the ordering and duplicate handling is not something that the format itself is responsible for. Taking it even further, the specification explicitly states that the on-disk representation is identical to |
That might be a good compromise because we probably want the same input entries / entry classes for
Yes, in my opinion we want each supported C++ type to appear at least once in the validation suite. But indeed, the question is how much different is it from a binary format perspective. There's two axes to that: index column encoding and nesting (e.g. |
As a side note, in roottest we do have a test of reading each STL collections in file format into all other STL collections of the same content (for a large subset of cases) |
After thinking this over, here are some more considerations:
The decision whether to merge |
Based on this statement:
and because I right now don't have a clear idea how to nicely merge the ordered and unordered variants (two field types instead of one, i.e., |
Thanks for the work of also updating the infrastructure and the CI, seems to pass 😃
Fine with me.
I would tend to remove the "duplicate" and "reverse" tests because at least I personally find it confusing that the output doesn't match the input, and it's actually not RNTuple that is responsible for that. Let's hear some opinions from others maybe? |
I agree that those tests would not be adding much value. On the other hand, see my question/pondering on |
While typing this, I realized that the same reasoning could be applied to all the Edit: and the Index column encoding... |
So, what would your proposal be? I mean, fundamentally they are all identical to |
This indeed has the 'nice' side effect that (except for map/multimap which contains pairs) the json files would be almost identical (i.e. just the (type)names would be different, isn't it) |
Not quite, since (a) sets don't preserve their order and (b) we don't store duplicate elements. We could make the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, apart from two comments that apply to all nested tests.
I would have previously argued that we don't need nesting to have the C++ type once, a fundamental item type could be sufficient. But the discussions convinced me that this is useful to have the nesting since it directly affects the
I don't have a really good proposal. The best I could come up with after some thinking is effectively documenting the reasoning behind the current tests, and just generalize based on that: #50 Let me know what you think. |
...for the tests that need them (currently, all nested `std::set` and friends).
Closes #14.